-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CosemDataset object-oriented api #42
base: main
Are you sure you want to change the base?
Conversation
btw ... looks like tests are only running against |
@tlambert03 I love the idea! Thanks for putting that together. If you are willing to iterate on this a bit, it might be nice to consider using datatrees to represent collections. I will definitely look through the PR more thoroughly in the next few days as for ci... thanks for reminding me. getting that cleaned up and working has been on my to-do list for a while, and docs too. PRs are welcome, but I can also just sort it out myself this week. |
yep, more than happy to play around with it. After taking a quick look, I think my first question is probably what exactly should I be representing as
do you want to sketch out roughly what structures you imagine for each? Would there be a single root Datatree representing the full bucket? datasets for each cosem dataset, and arrays for each source/scale combo? (Or would each cosem itself be a Datatree) |
def images(self) -> dict[str, ImageDict]: | ||
return self.metadata["images"] | ||
|
||
def read_image(self, image_id: str, level: int = 0) -> xr.DataArray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be a place to return a datatree
, e.g. via something like this:
def multiscale_datatree(group_url: str) -> DataTree:
group = read(group_url)
# this sorts on the key name, but a better approach is to sort by descending array size
array_keys = sorted(
tuple(name for name, _ in group.arrays()), key=lambda v: int(v[1:])
)
arrays = {
arr_name: read_xarray(os.path.join(group_url, arr_name))
for arr_name in array_keys
}
return DataTree.from_dict(arrays, name=group.basename)
Then you don't need to worry about validating the 'level' kwarg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and maybe rename the method that returns the arrays to .images
, and use the name image_meta
for the method that just returns image metadata?
return dataset_names() | ||
|
||
@staticmethod | ||
def all_thumbnails(ncols: int = 4, **kwargs) -> plt.Figure: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
possible alternative name: thumbnail_gallery
, or just gallery
) | ||
|
||
@staticmethod | ||
def all_names() -> list[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if we use a top-level Mapping
-like object where the keys are the names of datasets and the values are instances of CosemDataset
, we can replace this with the .keys
method on that object
top level: some kind of key-value interface, where the keys are dataset names, and the values are ~CosemDataset objects. Besides an instance of a dataset: I think the overall structure of the sources: after playing around with a quick hack, I think a A similar interface could be achieved by using One sticky issue that will bite us is that the current design of the supabase / postgres setup is bad -- I am exposing the structure of the raw database tables, but I need to introduce a layer of indirection so that I can make changes to the table structure without wrecking clients like this one here. I'm still mulling over how best to do that... |
Your data tree branch is fine too... would you prefer to just go with that and close this? |
i think my branch sucks! much better if we build off yours :) |
well, most of what we've got here is "extra" API ... but you raise a good point that it builds on the supabase schema, and perhaps we should minimize the extent to which we depend on that for now? Could be incrementally expanded in the future, and perhaps a simple key->Datatree map (dataset name to Datatree object) might be a good place to start here? It would be very nice to be able to load views too in some sense... but not sure what the best entry point for that is. my "dream" application here is to be using the neuroglancer cosem app to explore the data at various scales... maybe come across a field of view that you really like (that you want to do something with), copy the URL or some json state, pass it to a python function in this library and have an xarray object to work with, that includes all of the sources/channels that had loaded in the view, perhaps cropped to a specific region (etc...) but much of the details there could be hidden, keeping a relatively minimal user API until the database schema stabilizes a bit more? |
btw, more than happy for you to commit directly! |
Hey @d-v-b ... curious what you think about adding an object like this. I use this in a little pet project as slightly more interactive way to interact with cosem-datasets.
lemme know what you think, if it's something you'd want/consider including here, I can complete docstrings and clean it up a bit